Supervised AI Model Training 
============================

This tab facilitates feature selection, data partitioning, data balancing, and training supervised methods for tissue type classification.

Dataset Import 
--------------
Click on **Select file** under "Import CSV dataset". A dialog box will appear allowing you to browse your local storage and select the desired CSV file. After loading, a success message will be shown and dataset information will be displayed, including the number of slides, number of classes, total spectra, and spectra per class.

Feature Ranking
---------------
MassVision provides users with multiple feature ranking options. The hyperparameters associated with each method can also be tuned according to the user’s needs. Current methods include:

    - Linear Support Vector Classification
    - Partial Least Squares Discriminant Analysis
    - Linear Discriminant Analysis

Feature Selection
-----------------
Users can control which ions (features) are included in the classification step by choosing from several feature selection options:

    - **None:** Use the complete set of ions in the dataset without restriction.
    - **Top ranked:** Apply one of the available feature ranking algorithms and specify the number of top-ranked ions to retain. This allows classification to focus on the most informative features.
    - **Manual:** Upload a CSV file containing a single column with the indices of hand-picked ions. Only these ions will be used for classification.

Model Training/Validation
-------------------------

Model
*****
Select the AI model from the available list under **Model type** and set the hyperparameters according to your research needs. The available classification models include:
    
    - Principal Component Analysis followed by Linear Discriminant Analysis
    - Linear Support Vector Classification
    - Random Forest
    - Partial Least Squares Discriminant Analysis

Data partitioning
*****************
Choose the data partitioning configuration from **Data split scheme** to determine how data is divided for training and validation. Available options include:
  
    - **Training on whole dataset:** use the entire dataset for training and report performance measures on the training set.
    - **Random train/test split:** randomly divide the data into training and test sets, and report performance measures for both.
    - **Slide-based train/test customization:** manually select which slides/patients are included in the training or test sets using the provided checkboxes. Performance measures are reported for both sets.
    - **Leave-one-slide-out cross-validation:** run the slide-based customization iteratively, leaving one patient/slide out as the test set each time, and report the average performance metrics across folds for both training and test sets.

Data balancing
**************
To mitigate biases from imbalanced training data, MassVision supports three class-based balancing strategies available in the **Data balancing** dropdown:
    
    - **None:** no balancing is applied. 
    - **Undersampling:** randomly exclude spectra from majority classes until each class has the same number of spectra as the minority class.
    - **Oversampling:** randomly replicate spectra from minority classes until each class reaches the number of spectra in the majority class.
    - **Hybrid:** up-sample minority classes and down-sample majority classes to the average number of spectra per class.

Train/validate
**************
Once you are satisfied with the parameters, click **Train and validate** at the bottom of the tab to start training.  
If the **Export model pipeline** box is checked, a dialog will appear prompting you to specify the name and location for saving.  

After training, you will be redirected to the **Performance Report** tab, where you can review details of the training and test data distribution, performance measures, and—if applicable—visualizations such as LDA scatter plots. 

.. important::
   To save the trained classification pipeline for later use on whole-MSI data, be sure to check the box for **Export model pipeline**.